ABSTRACT The bacterium Streptococcus pneumoniae is one of the leading causes of fatal infections affecting humans. Intriguingly,
phylogenetic analysis shows that the species constitutes one evolutionary lineage in a cluster of the otherwise commensal
Streptococcus mitis strains, with which humans live in harmony. In a comparative analysis of 35 genomes, including phyloge- 
netic analyses of all predicted genes, we have shown that the pathogenic pneumococcus has evolved into a master of genomic 
flexibility while lineages that evolved into the nonpathogenic S. mitis secured harmonious coexistence with their host by stabilizing
an approximately 15%-reduced genome devoid of many virulence genes. Our data further provide evidence that interspecies
gene transfer between S. pneumoniae and S. mitis occurs in a unidirectional manner, i.e., from S. mitis to S. pneumoniae.
Import of genes from S. mitis and other mitis, anginosus, and salivarius group streptococci ensured allelic replacements and 
antigenic diversification and has been driving the evolution of the remarkable structural diversity of capsular polysaccharides of 
S. pneumoniae. Our study explains how the unique structural diversity of the pneumococcal capsule emerged and conceivably 
will continue to increase and reveals a striking example of the fragile border between the commensal and pathogenic lifestyles. 
While genomic plasticity enabling quick adaptation to environmental stress is a necessity for the pathogenic streptococci, the 
commensal lifestyle benefits from stability. 

IMPORTANCE One of the leading causes of fatal infections affecting humans, Streptococcus pneumoniae, and the commensal
Streptococcus mitis are closely related obligate symbionts associated with hominids. Faced with a shortage of accessible hosts, 
the two opposing lifestyles evolved in parallel. We have shown that the nonpathogenic S. mitis secured harmonious coexistence 
with its host by stabilizing a reduced genome devoid of many virulence genes. Meanwhile, the pathogenic pneumococcus evolved 
into a master of genomic flexibility and imports genes from S. mitis and other related streptococci. This process ensured anti- 
genic diversification and has been driving the evolution of the remarkable structural diversity of capsular polysaccharides of 
S. pneumoniae, which conceivably will continue to increase and present a challenge to disease prevention. 
Streptococcus pneumoniae is a leading cause of pneumonia,
meningitis, septicemia, and middle ear infections (1). Accord- 
ing to data from the World Health Organization, S. pneumoniae is 
the fourth most frequent cause of fatal infections worldwide (2). 
Intriguingly, the species is not related to other overt streptococcal 
pathogens but clusters within the mitis group of streptococci, 
which otherwise are important members of the commensal mi- 
crobiota of the oral cavity and pharynx (3, 4). The unique patho- 
genic potential of S. pneumoniae among the species of the mitis 
group streptococci is explained by an array of virulence factors 
that provide escape of host immunity, such as the polysaccharide 
capsule and the IgA1 protease, in addition to surface-exposed proteins
that enable adhesion to and destruction of host tissues (5, 6).
In spite of relative conservation of its genome, some pneumococcal
virulence factors show extensive structural diversity that ensures
survival of the species after immunity has developed in response
to infection or vaccination (5). One example is the capsular
polysaccharide, which occurs in more than 90 distinct structures, 
encoded by serotype-specific capsular biosynthesis operons (cps), 
which, combined, add up to the same size as the complete pneumococcal
genome (~2.1 Mb) (7). The 13 capsular polysaccharides
most frequently associated with disease form the basis of a child- 
hood vaccine currently implemented in most industrialized coun- 
tries (8). However, frequent switching of capsular serotype (9-11) 
and the potential emergence of novel structures present a signifi- 
cant challenge to the continued successful prevention of pneumo- 
coccal infections. 

Regulated natural competence for genetic transformation of 
pneumococci combined with induced lysis of noncompetent 
members of the same species enables frequent transfer of patho- 
genicity islands, exchange of complete virulence genes or frag- 
ments of them, and dissemination of antibiotic resistance within 
the species (12-17). In addition, recombination between S. pneumoniae,
Streptococcus mitis, and Streptococcus oralis has been reported
to be instrumental in the development and dissemination
of resistance to beta-lactam antibiotics (18-20). 

We previously proposed an evolutionary model suggesting 
that the species S. pneumoniae, S. mitis, and the more recently 
described Streptococcus pseudopneumoniae arose from a 
pneumococcus-like organism pathogenic to the immediate ances- 
tor of hominids (3). Being almost exclusively adapted to humans 
and other hominids, their success conceivably is closely associated 
with the population size of susceptible hosts. Here we present 
evidence supporting this evolutionary model and demonstrate the 
genetic basis of how a dichotomy of distinct but successful bacte- 
rial lifestyles evolved in parallel within their host. The pathogenic 
lifestyle of the pneumococcus, dependent on continued import of 
genes from neighboring species, results in antigenic diversity that 
will continue to challenge the prevention of pneumococcal infec- 
tions. 

RESULTS 

Phylogenetic relationships based on core genome sequences. To
shed light on the genetic processes that shaped the genomes of
S. pneumoniae and its close commensal relatives, we explored new 
genomic information. Alignment of 35 genomes of S. pneu- 
moniae, S. mitis, S. pseudopneumoniae, S. oralis, and Streptococcus 
infantis (see Table S1 in the supplemental material) identified a
core of 822,537 nucleotides (nt). The number of polymorphic 
sites within this concatenated sequence was 292,227 (35.5%), of 
which 240,553 sites were parsimoniously informative (i.e., present 
in more than one strain). Phylogenetic reconstruction based on 
these core genome sequences confirmed our previous observa- 
tion, based on selected housekeeping genes (3, 4), that S. pneu- 
moniae is a single lineage in a cluster otherwise composed of S. mi- 
tis, that S. pseudopneumoniae takes up an intermediary position, 
and that all three species are well separated from S. oralis and 
S. infantis (Fig. 1). The average genetic distance of members of the 
S. mitis/S. pneumoniae/S. pseudopneumoniae cluster to the designated
type strain of S. oralis, ATCC 35037, used as a common root,
is slightly but significantly (P < 0.0001) greater for S. pneumoniae 
(0.001309 ± 0.0002) than for S. mitis (0.001278 ± 0.0008). This 
supports our hypothesis (3) that the S. pneumoniae lineage is the 
phylogenetically most ancient and only recently has been undergoing
a population burst facilitated by the exponentially expanding
human species, its primary host. Because the commensal species
spreads vertically (21), its success does not depend on the host
population size.
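
The site counts reported above can be illustrated with a minimal Python sketch. It assumes a gap-free alignment held as equal-length strings and applies the standard definition of a parsimony-informative site (at least two character states, each present in at least two sequences). The function name site_stats and the toy alignment are hypothetical and are not part of the original analysis.

```python
from collections import Counter


def site_stats(alignment):
    """Return (polymorphic, parsimony_informative) column counts.

    alignment: list of equal-length strings, one per genome (gap-free core).
    """
    if not alignment:
        return 0, 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    polymorphic = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            polymorphic += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return polymorphic, informative


# Hypothetical toy alignment: columns 2 and 4 are informative,
# column 5 is polymorphic but carries only a singleton change.
toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
toy_polymorphic, toy_informative = site_stats(toy)

# Proportion of polymorphic sites reported in the text (292,227 of 822,537).
reported_percent = round(100 * 292227 / 822537, 1)
```

Applied to the reported figures, the same arithmetic reproduces the 35.5% polymorphic proportion stated in the text.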

Reductive evolution of the S. mitis genome. The previously 
demonstrated sporadic occurrence of recognized S. pneumoniae 
virulence factors in S. mitis strains (3, 22, 23) was confirmed by 
detailed comparison of the gene contents of the 35 genomes. Most 
strikingly, 12 out of 15 S. mitis strains had a complete cps locus in 
the same genomic region as in S. pneumoniae. Likewise, assumed 
virulence factors like IgA1 protease and zinc metalloprotease C,
neuraminidases A and B, autolysin, pneumolysin, several choline- 
binding proteins, and PavA were present in some strains of S. mitis 
and absent in others (Fig. 2). 

To determine if such shared virulence genes represent pneu- 
mococcal genes transferred to S. mitis or genes ancestral to both, 
we generated phylogenetic trees of all predicted genes in S. pneumoniae
TIGR4 and orthologs identified in all 35 genomes. In trees
of virulence genes (one example is shown in Fig. S1A in the supplemental
material), S. pneumoniae formed a tight cluster,
whereas S. mitis strains formed more diverse lineages in patterns 
congruent with the core genome-based tree (Fig. 1). This proves 
that they are ancestral genes that have been diversifying in parallel 
with other parts of the genome and subsequently were lost by 
some S. mitis strains in a reductive evolutionary process. The loss 
is reflected in the S. mitis genomes being up to 15% smaller than 
those of S. pneumoniae (see Table S1). However, a surprising proportion
(23.6%) of the 1,620 trees generated on the basis of nucleotide
sequences of all genes (excluding transposases and genes
unique to S. pneumoniae strains) showed clustering of S. pneu- 
moniae genes among S. mitis genes. We interpret this as evidence 
of acquisition by S. pneumoniae strains of homologous gene se- 
quences from strains of S. mitis. Although occasional trees identi- 
fied the source of the gene sequence, the majority of transfers had 
as donors putative S. mitis clones not represented in our sample of 
the undoubtedly large global population of S. mitis (see Fig. S1B).
The transfers from S. mitis to S. pneumoniae often affected several 
adjacent genes, amounting to sequences spanning from 116 bp to
10,600 bp, in full agreement with the sizes observed in an in vitro 
recombination experiment involving one strain each of S. mitis 
and S. pneumoniae (18). As shown in Table 1 and reflected in the 
phylogenetic tree in Fig. 1, Hungary 19A was the strain of S. pneu- 
moniae that acquired the largest proportion of genes (8.2% of 
genes, corresponding to 141 kb) from S. mitis. S. pseudopneu- 
moniae showed extensive recombination between the S. pneu- 
moniae lineage and S. mitis lineages, reflected in its intermediary 
position in the phylogenetic tree and its admixture of phenotypic
traits of the two species (24). While 86% of the genes clustered 
with S. mitis, 14% clustered with S. pneumoniae. No clear evidence
of acquisition by S. mitis strains of gene sequences from S. pneu- 
moniae was detected. However, as previously reported (18-20), 
genes encoding transpeptidases ("penicillin-binding proteins"),
gyrase, and adjacent genes (e.g., orthologs of SP_0335, SP_0370, 
SP_0371, SP_1218, and SP_1662-1669) (25) revealed mosaic sequence
structures (see Fig. S1C and D). This reflects multiple homologous
recombination events between S. pneumoniae and
S. mitis but often without clear traces of the direction of transfers. 

Evolution of capsular polysaccharide diversity in S. pneu- 
moniae. Next, we tested the hypothesis that import of genes ex- 
plains the extreme structural diversity of capsular polysaccharides 
in S. pneumoniae (n = 95), which has remained an enigma. The
pneumococcal cps operons consist of 12 to 22 genes directly in- 
volved in synthesis and transport of the polysaccharides (7). 
Among these, the glycosyl transferases, glycosyl phosphotrans- 
ferases, dehydrogenases, mutases, and epimerases are often 
unique to one or more serotypes and determine the distinct polysaccharide
structure (26). We aligned each protein (n = 1,575)
encoded by the cps locus of the S. pneumoniae serotypes (7) to the 
NCBI nonredundant protein database. This provided evidence of 
extensive import of cps operon genes from numerous Streptococcus
species, including other members of the mitis group (S. mitis,
"Streptococcus mitis biovar 2," S. oralis, S. infantis, Streptococcus
sanguinis, Streptococcus parasanguinis, and Streptococcus peroris) 
and members of the more distant anginosus and salivarius groups. 
The number of genes imported from a single or several different 
donor species ranged from one gene to the entire cps locus (see 
Table S2 in the supplemental material). Imported genes included
genes that were part of a cps operon in the donor, as well as genes
with other glycosylation functions outside the cps locus.

FIG 1 Phylogenetic tree of Streptococcus strains included in the study. The tree, generated by the minimum-evolution algorithm in MEGA version 5.2, was based
on 822,537-nt sequences shared by all 35 genomes listed in Table S1 in the supplemental material. It illustrates that S. pneumoniae is a single lineage in a cluster
otherwise composed of S. mitis and that S. pseudopneumoniae occupies an intermediary position. The bar represents the genetic distance.

The nucleotide identity between the putative donor and recip- 
ient cps genes ranged from 84 to 99%, presumably reflecting the 
time elapsed since the genetic transfer and/or the existence of do- 
nors not represented among the genome-sequenced streptococci. 
For instance, the membrane-associated flippase, responsible for 
transferring the oligosaccharide chains to the exterior of the pneu- 
mococcal membrane, is common to all pneumococcal cps operons 
except serotype 3 (7, 26). The genetic diversity among pneumococcal
flippase genes is ~16 times larger than the overall diversity
of the pneumococcal core genome (0.280% ± 0.056% versus 
0.017% ± 0.003%), in support of a diverse origin of the gene 
among pneumococci (see Fig. S2 in the supplemental material).
Alignments revealed >98% amino acid identities of flippases of
several pneumococcal serotypes to those of strains of a range of 
Streptococcus taxa that are otherwise genetically more distant. The 
assumed direction of transfer was further supported by two gene- 
based observations. First, comparison of the genetic distance be- 
tween genes from clonally independent strains of the respective 
serotypes of S. pneumoniae showed more conservation than 
among strains of donor species (Fig. 3). Second, several genes 
intact in the putative donor were pseudogenes in pneumococci. 
Comparison of serotypes belonging to the same serogroup (e.g., 
serogroups 7, 18, and 19) revealed that mutations resulting in 
pseudogenes, in some cases combined with import of additional 
genes from other donors or complete deletion of genes, have been 
driving the structural diversification within serogroups (see
Fig. S3). As an example, an evolutionary model for the origin and
diversification of the S. pneumoniae serogroup 19 is presented in
Fig. 4.

FIG 2 Comparative analysis of the gene contents of 35 genomes of S. pneumoniae,
S. mitis, S. pseudopneumoniae, and S. oralis. The genome of S. pneumoniae
TIGR4 served as a reference. Green indicates the presence and red the
absence of genes. Transposase genes are indicated by light-blue horizontal
lines on the right. The figure illustrates the strain-specific reductive evolution
of S. mitis genomes, resulting in gene loss to various extents, including genes
encoding virulence properties in S. pneumoniae.

TABLE 1 Numbers of gene replacements in S. pneumoniae strains
imported from S. mitis

Strain          Serotype    No. (%) of genes imported from S. mitis(a)
Hungary 19A     19A         133 (8.2)
Taiwan19F       19F         88 (5.4)
CGSP14          14          85 (5.3)
ATCC 700669     23F         72 (4.4)
P1031           1           71 (4.4)
TIGR4           4           61 (3.8)
TCH8431         19A         60 (3.7)
670             6B          56 (3.5)
JJA             14          56 (3.5)
G54             19F         45 (2.8)
70585           5           38 (2.4)
D39             2           28 (1.7)
R6              Rough 2     27 (1.7)

(a) Based on analysis of 1,620 annotated genes shared by S. pneumoniae TIGR4 and other
isolates.

Parallel evolution of genome plasticity and genome stability.
These findings indicate that interspecies gene transfer between
S. pneumoniae and neighboring species is unidirectional, i.e., from 
other species to S. pneumoniae. This is supported by further ob- 
servations. Competence for genetic transformation in pneumo- 
cocci depends on 22 genes (27). Screening of the 35 genomes 
identified all 22 genes in all genomes of S. pneumoniae and S. pseu- 
dopneumoniae, whereas only 10 out of 15 S. mitis strains and none 
of the S. oralis strains possessed all. Up to 3 of the 22 essential genes 
were missing or significantly truncated in some strains (see Table S3
in the supplemental material), suggesting reduced or absent
transformation competence.
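
A screen of this kind amounts to a set comparison between a list of required genes and the genes found in each genome. The following Python sketch shows the idea under stated assumptions: the six gene names are only a toy subset of the 22 competence genes, and the strain labels and presence data are hypothetical, not results from this study.

```python
def missing_genes(presence, required):
    """Map each strain to the sorted list of required genes it lacks."""
    required = set(required)
    return {
        strain: sorted(required - set(genes))
        for strain, genes in presence.items()
    }


def strains_with_full_set(presence, required):
    """Return the sorted names of strains carrying every required gene."""
    gaps = missing_genes(presence, required)
    return sorted(strain for strain, lacking in gaps.items() if not lacking)


# Toy subset of competence genes (assumption: illustrative only).
required = ["comA", "comB", "comC", "comD", "comE", "comX"]

# Hypothetical presence data, e.g. derived from ortholog clusters.
presence = {
    "pneumo_1": {"comA", "comB", "comC", "comD", "comE", "comX"},
    "mitis_1": {"comA", "comB", "comC", "comD", "comE", "comX"},
    "mitis_2": {"comA", "comB", "comD", "comE", "comX"},
    "oralis_1": {"comA", "comC", "comD"},
}

complete = strains_with_full_set(presence, required)
gaps = missing_genes(presence, required)
```

A truncated gene would need to be scored as absent before this step; the sketch does not attempt to detect truncation.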

Several other genes facilitate the incorporation of foreign ge- 
netic elements in the pneumococcal genome. S. pneumoniae 
strains possess one of two complementary Dpn restriction-modification
systems, DpnI or DpnII (28), that are part of the
competence (com) regulon. It was recently demonstrated that in- 
duction of this system is necessary for optimal pathogenicity is- 
land transfer (29). While present in all S. pneumoniae strains, the 
majority of S. mitis strains lacked intact dpn loci (see Fig. S4 and 
Table S1 in the supplemental material). The genes were either
missing in this location and in other parts of the genome or re- 
placed by other genes, such as a transposase and an integrase in 
strain SK667. When present, alignments of the S. mitis Dpn locus 
genes with those of S. pneumoniae showed that they are ancestral 
genes diversified in parallel with other parts of the respective genomes.
Interestingly, a third version of the locus (here termed
DpnIII) was demonstrated in S. pneumoniae ATCC 700669, S. mi- 
tis SK578, and S. oralis ATCC 35037. In these strains, two genes on 
opposite strands and encoding a restriction enzyme resembling 
MutH of Escherichia coli and a DNA (cytosine-5-)- 
methyltransferase family protein constituted the locus. Other 
strains of S. oralis lacked Dpn locus genes. These observations 
suggest that the Dpn-associated function is under deterioration in 
S. mitis and S. oralis. 

FIG 3 Example of comparisons and genetic distances of cps locus genes among S. pneumoniae and S. mitis strains. The nucleotide sequence identities (%) of
orthologous genes in S. pneumoniae serotype 2 and "S. mitis biovar 2" strains are shown for each flanking pair. Clonally independent strains of the respective
serotypes of S. pneumoniae showed more conservation than strains of donor species, supporting the proposed transfer from S. mitis to S. pneumoniae.
Gene key for cps operon genes: regulatory genes; initial transferase; glycosyl transferase; polymerase; flippase; transposase gene; UDP-glucose 6-dehydrogenase; UDP-glucose/GDP-mannose dehydrogenase; dTDP-L-rhamnose pathway gene; flanking gene; aliB and aliB-like genes.

Transposases are widely used in bacteria to facilitate intra- and
interstrain mobility of genes or islands of genes (30). The 13
S. pneumoniae strains possessed from 19 to 111 (median, 77) such 
elements distributed over the entire genome (Fig. 2), although 
some are degenerate, in agreement with their constant adaptation 
to the transforming genome. Notably, transposases are associated 
with cps operons of all pneumococcal serotypes, in most cases 
flanking the entire operon (7). Although most S. mitis and all 
S. oralis strains examined had complete cps operons, none in- 
cluded transposases. In general, S. mitis and S. oralis genomes 
harbored significantly fewer transposase genes (median number, 
8) (see Table S1 in the supplemental material). One exception was
S. mitis strain B6 (31), which in several ways, including the ge- 
nome size, is exceptional among S. mitis strains. 

Like transposases, repeat elements, including RUP (repeat 
units of pneumococcus) are assumed to facilitate genomic plastic- 
ity in addition to phase variation of genes (32, 33). In addition to 
facilitation of traditional homologous recombination, a recent re- 
port demonstrated that pneumococci can also generate diversity 
by transformation with fully homologous "self" DNA by generating
a variety of merodiploids within a population facilitated by
alternative pairing of repeat regions present in different parts of 
the genome (34). Analysis of the 35 genomes showed that pneu- 
mococcal genomes had 53 to 63 RUP elements, including one or 
two within the cps locus of all serotypes (except serotypes 5, 11a, 
and 23b), while S. mitis strains had either none or no more than 
three elements in the entire genome (see Table S1 in the supplemental
material).

Bacterial defense systems against attack by foreign DNA in- 
clude the clustered, regularly interspaced short palindromic re- 
peat (CRISPR) loci. In agreement with a recent report (29), none 
of the S. pneumoniae strains possessed CRISPR sequences. This corroborates
the finding that CRISPR loci artificially inserted into a pneumococcal
genome were spontaneously ejected when under environmental
stress (35). Likewise, the S. pseudopneumoniae strains
did not possess CRISPR/Cas systems. However, 5 of the 15 S. mitis 
strains and 4 of the 5 S. oralis strains possessed CRISPR sequences 
(see Table S1 in the supplemental material). A few of the spacers
showed sequence similarity to bacteriophage/prophage se- 
quences, most of which are Streptococcus specific and in some 
cases are integrated in S. pneumoniae and S. pseudopneumoniae 
genomes (not shown). 

DISCUSSION 

One factor in the coevolution of obligate symbionts of humans 
that has so far received little attention is the impact of the suscep- 
tible host population size. This factor is of particular importance 
in pathogenic (i.e., parasitic) species that induce immunity or 
sometimes death, leaving the host nonaccessible for repeated col- 
onization. Thus, successful survival of the pathogen requires a 
sizeable host population of sufficient density to allow spread be- 
tween susceptible hosts and/or a capacity of the pathogen for con- 
stant antigenic change. In contrast, commensals that achieve a 
mutualistic lifestyle induce a tolerogenic response in the host's 
immune system, allowing continued colonization and intimate 
and potentially lifelong association (36). Many species of the ge- 
nus Streptococcus are almost exclusively adapted to humans and 
other hominids. S. pneumoniae is one of the most important 
pathogens affecting humans (2). Although it is a widespread col- 
onizer particularly of children in day care centers, both coloniza- 
tion and infection result in rapid elimination (median duration, 
19 days) by antibodies directed to the capsular polysaccharide and 
presumably other surface-exposed antigens (37). In contrast, the 
closely related S. mitis is a lifelong companion of all humans in the
upper respiratory tract and is often present as mixed populations 
of multiple clones (38, 39). We have previously demonstrated that 
the two species share an immediate ancestor and have argued that
the ancestor was a pneumococcus-like species presumably pathogenic
to the immediate ancestor of hominids (3). The genome-based
data obtained in this study support this model. Our results,
furthermore, illustrate how a significant selection pressure resulting
from a shortage of potential hosts (40) was handled by the
S. pneumoniae-S. mitis-S. pseudopneumoniae ancestor in two opposing
ways occurring in parallel. S. pneumoniae maintained its
pathogenic potential, which facilitates horizontal spread, and optimized
its genome plasticity (17). In contrast, harmonious coexistence
by the majority of lineages becoming S. mitis was achieved
by elimination of properties that challenge the host combined
with increased genome stability (i.e., partial loss of competence
genes, transposases, repeat elements, and the Dpn restriction-modification
system, combined with acquisition of CRISPR/Cas
sequences). Interestingly, these S. mitis lineages are now highly
diverse and, according to traditional taxonomic standards, would
represent separate species (3). An important factor in this diversification
process has been the ecological and genetic isolation of
clones colonizing distinct lineages of human hosts combined with
a vertical spreading pattern. Our demonstration of various levels
of loss of virulence-associated factors and properties contributing
to genome plasticity among the examined strains of S. mitis (Fig. 2;
see also Table S1 in the supplemental material) indicates that this
is an ongoing process brought to different degrees of completion
by individual S. mitis lineages. Future studies may reveal if this is
reflected in the occasional ability of S. mitis strains to cause bacteremia
or endocarditis in groups of predisposed patients (41, 42).
Another result of the need of S. pneumoniae to expand its ecological
niche may be the adaptation of certain clones to an equine
host, which also included loss of virulence-associated genes (43).

FIG 4 Phylogenetic model for acquisition and diversification of the four serogroup 19 S. pneumoniae serotypes. Acquisition of the entire capsular biosynthesis
operon from S. mitis (SK564) introduced the serotype 19c capsule in S. pneumoniae. Subsequent incorporation of transposase and RUP sequences in the operon
facilitated transfer to other strains of S. pneumoniae, in which allelic replacement with selected genes acquired from "S. mitis biovar 2" (this taxon is erroneously
classified as a biovar of S. mitis), loss of genes, and gene mutations resulted in the structurally distinct capsular polysaccharides of serotypes 19b, 19a, and 19f. A
detailed comparison of the cps operons of the four serogroup 19 serotypes is shown in Fig. S4 in the supplemental material.



Availability of a critical population of potential hosts (40) be- 
came an evolutionary bottleneck to the pathogen, reflected in the 
significant homogeneity of the core genome of today's pneumo- 
coccus (Fig. 1). In addition to the expression of crucial virulence 
properties, life as a pathogen of the S. pneumoniae lineage required 
optimal genome plasticity, enabling antigenic diversity of surface 
structures. For example, the relative sequence diversification of 
the paralogous zinc metalloproteases IgA1 protease, ZmpB, and
ZmpD is striking evidence of significantly enhanced selection for 
diversification of surface-exposed proteins in the pathogen 
S. pneumoniae compared to the closely related commensal strep- 
tococci (16). In addition to homologous recombination within 
the population of pneumococci, our results show that the need for 
diversification was remarkably solved by its continued exploita- 
tion of the gene pool of neighboring species. In some S. pneu- 
moniae strains, up to 9% of the alleles of genes were imported 
from S. mitis (Table 1). This is an ongoing process facilitated by its
colonization of an ecological niche, albeit briefly, where it fre- 
quently meets multiple members of related commensal species 
that serve as a genetic toolbox. Most remarkable is our finding that 
the previously enigmatic diversity of capsular polysaccharide 
structures expressed by S. pneumoniae is a direct result of gene 
import from several species of commensal streptococci, including 
S. mitis, the "S. mitis biovar 2" (mislabeled since it is more closely 
related to S. oralis [4]), S. oralis, S. infantis, S. sanguinis, S. para- 
sanguinis, S. peroris, and members of the more distant anginosus 
and salivarius groups (see Table S3 in the supplemental material). 
In several serotypes, complete cps loci had been imported from a 
single donor, in some cases in several independent steps. In others, 
a mosaic of genes imported from distinct donors was evident. 
Contributing to the diversification that constitutes distinct serotypes
belonging to the same serogroup (e.g., serogroups 7, 18, and
19) were mutations resulting in pseudogenes, import of additional 
genes from other donors, or complete deletion of genes (Fig. 3; see 
also Fig. S4). This process conceivably will continue to result in 
additional antigenic diversity that may challenge the currently 
successful prevention of pneumococcal infections by vaccination. 

This is the first demonstration of how selective pressure resulting
from a shortage of potential hosts was handled by bacteria in
two opposing ways occurring in parallel. Harmonious coexistence 
by lineages becoming S. mitis was achieved by elimination of 
properties that challenge the host combined with increased ge- 
nome stability. Life as a pathogen of the S. pneumoniae lineage 
required optimal genome plasticity combined with antigenic di- 
versity of surface structures, including capsular polysaccharides, a 
challenge remarkably solved by its continued exploitation of the 
gene pool of neighboring species. More recently, success of the 
S. pneumoniae lineage reflected in the lineage-specific boost of the 
pneumococcus population has been ensured by the dramatic ex- 
pansion of the susceptible host population. 

MATERIALS AND METHODS 

Bacterial genomes. The 35 streptococcal genomes examined in the study 
are listed in Table S1 in the supplemental material together with NCBI
accession numbers. A total of 11 genomes sequenced as a part of this study 
were generated using the 454 platform (GS20, FLX, and/or Titanium) and 
assembled with the Newbler assembler. Details on the libraries con- 
structed, sequencing coverage, and assembler version used are available in 
the GenBank entries. 

Alignment of genomes. A multiple whole-genome nucleotide align- 
ment of contigs or complete chromosomes from the 35 whole genomes 
was generated using the software program Mugsy (44), and clusters of 
syntenic orthologs across the genomes were obtained with Mugsy- 
Annotator (45). 

Phylogenetic analyses. A phylogenetic tree based on the concatenated 
core genome sequences from the Mugsy alignment was generated using 
the minimum-evolution algorithm according to the maximum composite 
likelihood model in the software program MEGA 5.2 (46) and validated 
by bootstrap analysis based on 500 replications. Recombination in se- 
lected genes was visualized using the program SplitsTree 4 (47). 
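
The resampling step behind a bootstrap analysis can be sketched in Python as follows. This is only an illustration of drawing alignment columns with replacement; it does not build trees and is not the MEGA implementation. The toy alignment, the fixed seed, and the function name bootstrap_replicate are assumptions made for the example.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: list of equal-length strings, one per taxon.
    rng: a random.Random instance, passed in for reproducibility.
    """
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical toy alignment of three taxa.
toy_alignment = ["ACGT", "ACGA", "TCGA"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(500)]
```

In a full analysis, a tree would be inferred from each replicate, and the support for a branch would be the fraction of replicate trees that contain it.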

Bioinformatics tools and analyses. Annotated genome sequences 
from the 35 genomes (see Table S1 in the supplemental material) and
Mugsy-Annotator clusters of syntenic orthologs were loaded into the 
Sybil comparative genomics software package (48) for comparative anal- 
yses. 

To determine the extent of recombination between S. pneumoniae and 
the related commensal species, we aligned nucleotide sequences within 
Mugsy-Annotator clusters and generated minimum-evolution phyloge- 
netic trees in MEGA5.2. A total of 1,620 trees (excluding transposases and 
genes unique to S. pneumoniae strains) were generated and manually ex- 
amined. 

The presence or absence of annotated genes based on Mugsy- 
Annotator clusters was detected in Sybil and confirmed by blastn analysis 
(49). Figure 2 was generated by loading profiles of gene presence and 
absence into the MeV interface (50). RUP (repeated unit of pneumococ- 
cus) elements were identified by searching TIGR4 RUP sequences (32) 
with blastn against the 35 genomes. 
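
Counting repeat elements from a blastn search can be sketched as a tally over tabular output. The Python example below assumes the standard 12-column tabular format (query, subject, percent identity, alignment length, and so on) and that the subject identifier names the genome. The identity and length thresholds, the identifiers, and the example rows are hypothetical; the cutoffs actually used in the study are not stated here.

```python
import csv
import io
from collections import Counter


def count_hits(tabular_text, min_identity=90.0, min_length=80):
    """Count hits per subject in 12-column tabular BLAST output.

    Columns used: 1 = subject id, 2 = percent identity, 3 = alignment length.
    The default thresholds are illustrative assumptions.
    """
    counts = Counter()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject = row[1]
        identity = float(row[2])
        length = int(row[3])
        if identity >= min_identity and length >= min_length:
            counts[subject] += 1
    return counts


# Hypothetical rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
example = "\n".join([
    "RUP\tgenomeA\t98.1\t107\t2\t0\t1\t107\t5000\t5106\t1e-50\t190",
    "RUP\tgenomeA\t85.0\t107\t16\t0\t1\t107\t9000\t9106\t1e-20\t100",
    "RUP\tgenomeB\t95.3\t60\t3\t0\t1\t60\t100\t159\t1e-15\t90",
    "RUP\tgenomeA\t93.5\t105\t7\t0\t1\t105\t70000\t70104\t1e-40\t160",
])

rup_counts = count_hits(example)
```

For draft genomes split into contigs, the subject identifiers would first need to be mapped to their genome before tallying.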

Genetic distances, i.e., the number of base substitutions per site from 
averaging over all sequence pairs, were determined in MEGA5.2 using the 
maximum composite likelihood model (51) based on aligned single genes 
or concatemers of six multilocus sequence type (MLST) genes of S. pneumoniae
(52).
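
The averaging over all sequence pairs can be illustrated with a short Python sketch. As a simplifying assumption it uses the uncorrected p-distance (proportion of differing sites) rather than the maximum composite likelihood model applied in MEGA, so its values are not directly comparable to those reported in this paper. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_pairwise_distance(sequences):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of one gene from three strains.
toy_gene = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
toy_mean = mean_pairwise_distance(toy_gene)

# Ratio of the two diversity estimates quoted in the Results
# (0.280% for flippase genes versus 0.017% for the core genome).
diversity_ratio = 0.280 / 0.017
```

The ratio of the two quoted diversity estimates comes out at roughly 16, in line with the comparison made in the Results.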

CRISPR regions were identified using the CRISPR finder tool
(http://crispr.u-psud.fr).



Nucleotide sequence accession numbers. The Whole Genome Shotgun
projects have been deposited at DDBJ/EMBL/GenBank under the
following accession numbers: Streptococcus mitis SK137, JPFS00000000;
Streptococcus mitis SK271, JPGW00000000; Streptococcus mitis SK1126,
JPFT00000000; Streptococcus mitis SK629, JPFU00000000; Streptococcus
mitis SK667, JPFV00000000; Streptococcus mitis SK642, JPFW00000000;
Streptococcus mitis SK637, JPFX00000000; Streptococcus mitis SK578,
JPFY00000000; Streptococcus mitis SK608, JPFZ00000000; Streptococcus
oralis SK141, JPGA00000000; Streptococcus oralis SK143, JPGB00000000.
The versions described in this paper are versions XXXX01000000.

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at
http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01490-14/-/DCSupplemental.

Figure S1, EPS file, 1.8 MB.

Figure S2, EPS file, 1.1 MB. 

Figure S3, EPS file, 1.6 MB. 

Figure S4, EPS file, 3.6 MB. 

Table S1, DOCX file, 0.1 MB.

Table S2, DOCX file, 0.1 MB. 

Table S3, DOCX file, 0.1 MB. 